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Abstract 

This article surveys the procedures used for deriving detector transfer functions and normal- 
izing probability densities for the statistical analysis technique known as the "matrix element 
method" in the context of high energy physics (HEP) data analysis. Common misconceptions 
about transfer functions and efficiencies are pointed out and clarified. 

1 Matrix Element Analysis in a Nutshell 

The matrix element (ME) method was introduced into the HEP data analysis practice relatively 
recently, in a pioneering measurement of the top quark mass by the DO Collaboration [U [2] . The 
method can be utilized both in searches of new phenomena and in precision measurements of 
physical model parameters. It consists in calculating the probability of observing an event in a 
particle detector according to 

Pev(yIa) = ^/,P,(yIa) (1) 

i 

where y is the vector of observed quantities in the measurement space Y, a is the vector of model 
parameters (both theoretical and instrumental), and ft are fractions of different non- interfering 
production channels consistent with y. In the following discussion, it will be assumed that /« > 
for every production channel i and that the constraint J2i /i = 1 is imposed. For each channel i, 
the probability to measure y is estimated from 

P,(y|a) = f IP,(y|x,a)ei(x,a)|M,(x,a)|2T,(x,a)dx (2) 

where 

r2(y) — Indicator function for the analysis acceptance (1 for events which pass the event selection 
criteria, otherwise). This term can be replaced by 1 in case only accepted events are considered. 
X — Variables which uniquely specify a point in the channel phase space Xj, x G Xi. 
dx — Differential element of the phase space Xi. 
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crj(a) — Channel cross section: crj(a) = |Mj(x, a)p rj(x, a) dx. 
Ai{a) Overall experimental acceptance. 

VFi(y|x, a) — Detector transfer function. This is the probability density to observe detector 
response y E Y when the "true" phase space coordinate of the event is x. This function is 
normalized by Jy ^{y)Wi{y\x, a) dy = 1 for every value of i, x, and a. 

ei(x, a) — Efficiency to detect an event originated at the phase space point x (i.e., the proba- 
bility that an event originated at x actually ends up with y for which ri(y) = 1). 

|Mj(x, a)|^ — Squared matrix element of the process. 

Ti(x, a) — Other factors which do not depend on y {e.g., flux of colliding beams, parton 
distribution functions). 

The normalization condition imposed on VFj(y|x, a) ensures that Pj(y|a) is a properly normal- 
ized probability density in Y for all values of a. Indeed, 

/ Pi{y\a)dy= [ -^M^ f l^,(y|x, a) e,(x, a) |M,(x, a)p r,(x, a) dxdy. 

Jy Jy cri{a)Ai[a) Jx, 

Integrating over y first and taking into account the normalization of the transfer function, one gets 

/ P,(y|a)dy= ^ / e,(x,a) |M,(x,a)pT,(x,a)dx = = 1, 

Jy ai{a)Ai{a) Jxi Ai[a) 

where the symbol (...) stands for averaging with respect to the process density p(x|a) in the phase 
space: p(x|a) = ^^|^|Mj(x, a)p ri(x, a). With properly normalized Pj(y|a), normalization of 
Pev(y|a) is ensured as well. 

The formulation of the ME method just described assumes that the production channel fractions 
fi arc known with sufficient precision and are not of interest in the data analysis. It can be easily 
generalized for use with some densities 7r(/j|a) which represent the prior knowledge about fi as well 
as to the case in which fi are parameters to be measured. 

The ME approach offers several important advantages over all other data analysis schemes 
commonly used in HEP: 

— The method is universal and can be applied to a wide variety of particle processes for which 
theoretical models have been established. 

— The theoretical assumptions about the process under study (matrix elements M, (x, a) , chan- 
nel fractions fi, parton distribution functions) are incorporated into the data analysis in the 
most efficient manner. The whole procedure can be viewed as a Bayesian marginalization 
of the event probability over all unobserved degrees of freedom. Particle theory provides 
a well-motivated informative prior for this marginalization. 

— Some widely used data analysis methods introduce implicit assumptions about the shape of 
detector resolution functions. In particular, methods based on minimization (used, e.g., 
in kinematic fitting) assume Gaussian measurement errors, and the effect of this assumption 
on the quality of statistical modeling can not be quantified within the y^-hascd method itself. 
There is no such inherent restriction in the ME approach; very detailed and precise detector 
models can be usefully employed. 

— Maximization of the likelihood L(a) = n-f*ev(y|a) results in an efficient (in the statistical 
sense) estimate of the parameter a. Straightforward profiling or marginalization of the likeli- 
hood can be utilized in case some of the a dimensions are not of interest and can be treated 
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as nuisance parameters. In effect, systematic uncertainties (wliich are notliing else but the 
uncertainties due to imprecisely known values of nuisance parameters) are calculated on the 
event-by-event basis and, therefore, each event contributes into the overall parameter estimate 
with an optimal weight which takes into account both statistical and systematic uncertainty. 

— According to the Neyman-Pearson lemma, the likelihood ratio 



is the optimal discriminant function for channel j. In particular, when channel j repre- 
sents the signal under study, this function can be used to assign events to either "signal" or 
"background" category by testing whether rj(y[a) > cutoff. This method achieves the best 
misclassification rate among all possible classifiers with the same signal selection efficiency. 
Despite all these advantages, due to its complexity and lack of standard computational tools the ME 
approach is not necessarily the obvious first choice among various HEP data analysis techniques. 
Consider, for example, the most general case in which the integration region Xi in Eq.[2]corresponds 
to the relativistic phase space of all initial and final state particles, and the space of measurements 
Y corresponds to the complete data record produced by an experimental apparatus in a triggered 
high energy event. This general case will remain intractable for any foreseeable future because the 
dimensionality of the integral and the complexity of the integrand are just too high: thousands of 
particles can be produced in a high energy collision and then traced in a detector with millions of 
data acquisition channels. Reducing the dimensionality of Xi and Y spaces to a manageable level 
while preserving the information about parameters of interest is one of the most critical issues in 
practical applications of the ME method. 

2 A Comment on Dimensionality Reduction 

The first and the most important dimensionality reduction stage employed by all collider experi- 
ments is the process of "event reconstruction", and "jet reconstruction" in particular. In all HEP 
applications of the ME method developed so far, the final state "soft QCD" processes (parton show- 
ering and hadronization) were combined together with the detector response, and this combination 
was subsequently modeled empirically by the jet transfer functions. This particular approach is 
well motivated (at least at high c.m.s. energies such as those of the Tevatron and the LHC) by 
the QCD factorization theorem and results in a drastic dimensionality reduction of Xi which then 
becomes the phase space of parton-level quantities. In addition to jets, the event reconstruction 
procedure also produces other high-level "physics objects" (lepton candidates, decay vertices, miss- 
ing transverse energy, etc.), so the measurement space Y is typically associated with the variables 
which describe such objects. 

After this first and necessary dimensionality reduction stage, calculation of the Eq. [2] integral 
can still remain a formidable problem. For example, the leading order parton phase space of the 
reaction — >■ tt — )• W~^bW~b iv + A jets, £ = e or ^ (this is the "golden channel" for measuring 
the top quark mass at the Tevatron) is 24-dimensional^ , while the space of reconstructed quantities 

^Two initial and six final state particles are described by 32 variables, but four energy-momentum conservation 
laws and four well-known masses (of the initial partons, £ and i/) result in a trivial reduction to 24. 
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is described by ~20 variables^. Moreover, the integrand has a complex structure because of the 
resonant nature of t and W, and its efficient evaluation requires a highly nontrivial phase space 
sampling scheme. Inclusion of the processes beyond the leading order increases the phase space 
integral complexity even further. Since the number of integrals which has to be evaluated is usually 
quite large, ^ additional dimensionality reduction assumptions may be necessary for purely practical 
reasons: to simplify the reaction kinematics and to speed up the convergence of the integration 
procedures. The following ideas have been explored, in various combinations: 

— Neglect some of the production channels (especially if their corresponding event fractions fi 
are expected to be small). 

— Assume that some or all of the partons are on shell. 

— Use tree-level, leading order matrix element and phase space. 

— Further simplify the matrix element. Such simplifications might use narrow width approxi- 
mations for the resonances, ignore spin correlations in pair production, consider only valence 
quarks in the production mechanisms, etc. 

— Assume that initial partons have zero transverse momentum (hadron beams implied). 

— Assume that some event variables can be perfectly measured in the detector (i.e., transfer 
functions for these variables are represented by Dirac delta functions). 

Each of these assumptions reduces the quality of the process statistical model and results in either 
degraded precision of a parameter estimator (in case the purpose of the analysis is a parameter 
measurement) or diminished power of a signal discriminant. Naturally, assumptions of this kind 
should be avoided unless either their effect could be shown to be negligible or the practical problems 
could not be overcome by other means. A number of techniques could be employed to increase 
the calculation speed without sacrificing the fidelity of the statistical model. These techniques 
include optimization of phase space and channel sampling [3', '3] , use of multidimensional integration 
algorithms with better convergence properties than the standard Monte Carlo O EJ El [8], and 
speeding up the evaluation of the integrand {e.g., by tabulating detector transfer functions for fast 
lookup, or by factorizing the integrand into terms which depend on different parameters so that 
some terms do not have to be recalculated for every value of a) . 

3 Alternative Formulations of the Method 

The first [U [2] and some subsequent (e.g., [9]) applications of the ME approach in HEP data 
analysis utilized a somewhat different formulation of the method. Expressed with the notation 
introduced in Section (H the "extended likelihood" for the parameter a was postulated to be 

_ N 

Lext(a) = e-^'lY^-^iyl'^^^y Yl Pev(yfcla), (3) 

k=l 

^Three momentum components for £ and each jet, and transverse distance of the secondary vertex to the beam 
axis for each jet. v escapes undetected. Two additional variables should be included in the count if the missing 
transverse energy is used for event selection. 

^For example, a recent measurement of the top quark mass [T7] reports using a sample of 1,087 it candidate events, 
while the likelihood is calculated on a 32 x 26 rectangular grid of parameter values. The number of Monte Carlo 
events which has to be processed in order to calibrate such a measurement is typically couple orders of magnitude 
higher. 
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where is the number of events in the data sample, and Pev(y|a) stands for the probabihty to 
observe y without the requirement that /y Pev(y|a) dy is normahzed to 1. Although this may 
not be immediately obvious, this formula actually presumes that Pev(y|a) contains an additional 
parameter: a freely floating normalization factor. Likelihood maximization with respect to that 
factor restores proper normalization of the Pev(y|a), as illustrated in [1]. 

The utility of this particular technique is not clear: the normalization integral /y PQ^{y\a) dy is 
needed anyway in order to calculate the likelihood, and the parameter estimation requires an extra 
minimization step (typically, — InL(a) is minimized in order to maximize the likelihood). There is, 
however, a more useful extended likelihood formulation which can actually exploit the correlation, 
if any, between the observed number of events and the probability density shape. A detailed 
description of this formulation can be found in [lOj . 

A more substantial difference between Eq. [2]and the channel probability expressions used, e.g., 
in [111 [T2I [T3l |T3] consists in the explicit inclusion of the phase space efficiency term ej (x, a) in 
Eq. [2j It is indeed possible to write 

P,(y|a) = [ VF,'(y|x,a)lM,(x,a)!2r,(x,a)dx (4) 

without this efficiency term, assuming transfer functions Wl{y\x,a.) of a different kind. In or- 
der to maintain proper likelihood normalization, these transfer functions must satisfy a different 
normalization condition: /y Wj'(yjx, a) dy = 1 for every i, x, and a. It is obvious that the inte- 
grals in Eqs. [2] and [H will result in the same likelihoods calculated for an arbitrary data sample in 
case VF/(y|x, a) = Wj(y|x, a) ei(x, a). This relationship can be multiplied by ^l{y) and integrated 
over y. Together with the respective normalization conditions for Wi and VF/, this leads to an 
intuitive definition of phase space efficiency in terms of Wj'(y|x, a) and $^(y): 

e,(x,a) =^f](y)W,'(y|x,a)dy. (5) 

There is, however, a crucial difference between functions IVj(y|x, a) and iy/(y|x, a): in order to 
completely define Wi{y\x,a.), one has to know the function values only for those y for which 
i7(y) = 1. On the other hand, the functions Wl{y\x,a) must be defined for every y so that their 
normalization integrals can be evaluated. As it will be discussed in the next section, no reliable 
algorithm has been identified so far for constructing such transfer functions — they suffer from 
underspecification in the f^(y) = region. 

For the remainder of this article, the following terminology will be employed. Transfer functions 
which satisfy the normalization condition Jy Wj'(y|x, a) dy = 1 will be called "type I". Historically, 
these transfer functions were introduced first. Functions normalized by Jy ^iy)Wi{y\x, a) dy = 1 
will be designated "type 11". Finally, the "type III" transfer functions will be formally defined 
via the type II functions by the expression Wj"(y|x, a) = iyj(y|x, a) ej(x, a). These functions are 
normalized by 

^f)(yX(ylx,a)dy = e,(x,a). (6) 

Although this equation looks almost identical to Eq. [5l its logic is inverted: instead of using the 
transfer function to determine the phase space efficiency, the efficiency is used to impose the transfer 
function normalization. Type III transfer functions defined in this manner do not suffer from the 
type I underspecification problem, and these are the correct functions to use instead of inside 
the channel probability integral represented by Eq. HI 
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4 Transfer Function Modeling 



In order to obtain type II (or any other) detector transfer functions directly from data, one has 
to solve the inverse problem for VFj(y|x, a). Unfortunately, this is feasible only when the phase 
space Xi is sufficiently restricted and the background contamination is minimal. Such conditions 
can indeed be found in test beams and in certain low-background processes on e~^e~ colliders, but 
pp or pp collider environments are not conductive to this approach. Therefore, transfer function 
derivation procedures utilized so far in HEP ME analyses always relied upon a reasonably well- 
tuned detector simulation software package. For a given channel, such a package starts with x and 
provides either a simulated y or an indication that the event does not fall inside the y region for 
which f^(y) = 1. 

Computationally, it is not feasible to estimate the transfer function by simulation in the process 
of evaluating Eq. [2] integral. For modern HEP experiments, it takes a few tens of CPU seconds to 
simulate particle detector response to an event Since at least a few hundred events are needed 
to form an estimate of the probability density for each phase space point and the number of 
phase space points per integral can be about 10^, the required CPU load is prohibitive. Instead, 
one constructs the transfer functions from pools of pre-simulated Monte Carlo (MC) events by 
introducing reasonable assumptions about the transfer function properties. The main assumptions 
are: 

— To the first order, the full event transfer function can be factorized into the product of 
transfer functions for individual physics objects. Small corrections, e.g., for nearby jets, can 
be introduced on top of this factorization. 

— The transfer functions are piecewise continuous together with their first few derivatives. The 
discontinuity points are known in advance (for example, at the junctions of separate detector 
subsystems) . 

Usually, the detector response to jets is the most difficult transfer function factor to model. A signif- 
icant fraction of ME analyses performed so far attempted to construct type I jet transfer functions 
by following the "original recipe" outlined in [T]. According to this recipe, jets are generated by MC 
with the same process as the process under study (this makes sense for factorized transfer functions 
without additional corrections, as energy response for most jet reconstruction algorithms does ex- 
hibit a mild dependence on jet multiplicity). The selection of events is performed either according to 
rj(y) used for the data or according to a different i^'(y) which corresponds to modified, "relaxed"^ 
selection criteria. Simulated jets are matched to Monte Carlo partons using an angular matching 
criterion in the r]-ip space. The jet angular resolution is assumed to be perfect (modeled by Dirac 
delta functions), while a normalized probability density function (a double Gaussian is common) is 
fitted to a distribution of some energy-like jet quantity (transverse momentum or energy) using the 
same (or sometimes different) energy- like parton quantity as the explanatory variable (predictor). 
Some functional form (usually linear) is assumed for the parameters of the fitted density expressed 
in terms of the predictor. This whole procedure thus fits a nonlinear regression model in which the 
shape of the response distribution is constrained by its assumed functional form and depends on 
the predictor. 

As it turns out, this recipe has a serious deficiency and it can't possibly produce fully correct 
type I transfer functions. The problem is that the Monte Carlo events in the fits are preselected, 

^Typical for ATLAS and CMS detector simulation packages. 

^r2'(y) — 1 for every y for which fl{y) = 1. In addition, f^'(y) ~ 1 for some y for which n(y) = 0. 
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and only events with those y for which Jl(y) = 1 (or ^^'(y) = 1, depending on the details) are 
used. Thus the fitted MC events do not populate the complete Y space on which type I transfer 
function normalization is defined. Such event samples can instead be used to fit type II transfer 
functions, but the recipe calls for a wrong functional form which can't represent a sharp cutoff in y. 
Even if the sample selection criteria are relaxed as much as possible, the inherent inefficiencies 
of jet reconstruction algorithms would introduce an effective selection ^^"(y) which should not be 
ignored. 

Defective transfer functions derived according to this recipe affect probabilities of events which 
contain low-energy jets near the y cutoff but do not invalidate the overall measurement results. The 
saving grace is provided by the analysis final calibration procedure which corrects for biases and 
pulls. However, just as unnecessary dimensionality reduction assumptions, such functions degrade 
either the measurement precision or the power of the signal discriminant. 

Other methods of transfer function derivation have been developed. In Ref. [15] type I transfer 
functions were constructed by an altered procedure. The deficiency of the original recipe was noted 
and the i^'(y) = 1 region was used to fit just the shapes of the transfer function densities but not 
to fix their normalization. The transfer functions were then extrapolated into the $^'(y) = region 
devoid of events. This method results in more appropriate functional shapes and makes the transfer 
function normalizable in the whole Y space, but at the cost of introducing another problem: that of 
extrapolation ambiguity. The fatter the extrapolated tail inside ^^'(y) = 0, the smaller the function 
values become inside r2'(y) = 1 due to the imposed normalization condition. For this method, the 
effect of transfer function mismodeling is difficult to assess. Nevertheless, this is undoubtedly an 
improvement upon the original recipe. 

The authors of Ref. [16j explicitly build type II transfer functions using appropriate functional 
shapes which can model the ri(y) cutoff. However, these functions are then used as if they were 
type I, inside Eq. HI The authors demonstrate that in this case the overall probability normalization 
integral (called "observable cross section" in their paper) becomes independent of detector accep- 
tance. Instead of raising the red flag, this property appeals to the authors because detector-related 
parameters no longer enter the likelihood normalization. It should be obvious to the reader of this 
note that this "simpliflcation" of the likelihood comes at the price of an inappropriate transfer func- 
tion model which distorts the statistical description of the process and increases the measurement 
error. 

A proper construction of type HI jet transfer functions was carried out in Ref. [17J (at the time 
of this writing, this work stands as the most precise single measurement of the top quark mass) . The 
authors model both transverse momentum and angular jet response with nonparametric statistical 
techniques. They also employ a nonparametric estimate of the phase space efficiency to normalize 
their transfer functions in the ^^(y) = 1 region. The resulting transfer functions are then used to 
calculate event observation probabilities according to Eq. [H 

5 Practical Consequences 

It should be apparent from the preceding discussion that an estimate of the phase space efficiency 
ei(x, a) must be constructed in order to reliably calculate the probability Pj(y|a) to find an event 
with observables y. The calculation can utilize either type II transfer function in combination 
with Eq. [2] or type HI transfer function in combination with Eq. HI Although these approaches are 
technically equivalent, the one which utilizes Eq. [2]is conceptually easier to understand because in 
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that equation ej(x, a) has a very straightforward probabihstic interpretation. 

The ej(x, a) estimate can be obtained from the usual Monte Carlo event samples generated for 
analysis calibration purposes.^ It is important that these event samples are not filtered, as often 
done in order to conserve disk space. This is because reconstruction of the x-dependent efficiency 
denominator (e = events accepted/events total) becomes difficult or impossible when a fraction of 
events in the sample is discarded based on the value of y. 

Finally, it should be pointed out that both type II and type III transfer functions can be 
constructed using "relaxed" MC event selection criteria {e.g., in order to simplify normalization of 
the transfer functions when certain types of instrumental parameters, such as the overall jet energy 
scale, are scanned). In this case the corresponding "relaxed" phase space efficiencies should be 
derived and used inside Eqs. [2] or [H 
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